Overview

Dataset statistics

                                Train Dataset   Test Dataset
Number of variables             12              12
Number of observations          712             179
Missing cells                   695             171
Missing cells (%)               8.1%            8.0%
Duplicate rows                  0               0
Duplicate rows (%)              0.0%            0.0%
Total size in memory            72.3 KiB        18.2 KiB
Average record size in memory   104.0 B         104.0 B
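The missing-cell percentages can be recomputed from the counts in the table. The sketch below assumes the percentage is missing cells divided by variables times observations; the report does not state its formula, and the names used here are illustrative only.

```python
# Counts copied from the Dataset statistics table.
datasets = {
    "Train Dataset": {"variables": 12, "observations": 712, "missing_cells": 695},
    "Test Dataset": {"variables": 12, "observations": 179, "missing_cells": 171},
}


def missing_cell_pct(stats):
    """Missing cells as a percentage of all cells (assumed definition)."""
    total_cells = stats["variables"] * stats["observations"]
    return 100.0 * stats["missing_cells"] / total_cells


missing_pct = {name: round(missing_cell_pct(s), 1) for name, s in datasets.items()}
print(missing_pct)
```

Under that assumption both results agree with the reported 8.1% and 8.0%.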

Variable types

              Train Dataset   Test Dataset
Numeric       5               5
Categorical   4               4
Text          3               3

Alerts

Train Dataset                                    Test Dataset                                     Type
Sex is highly overall correlated with Survived   Sex is highly overall correlated with Survived   High Correlation
Survived is highly overall correlated with Sex   Survived is highly overall correlated with Sex   High Correlation
Age has 140 (19.7%) missing values               Age has 37 (20.7%) missing values                Missing
Cabin has 553 (77.7%) missing values             Cabin has 134 (74.9%) missing values             Missing
PassengerId has unique values                    PassengerId has unique values                    Unique
Name has unique values                           Name has unique values                           Unique
SibSp has 484 (68.0%) zeros                      SibSp has 124 (69.3%) zeros                      Zeros
Parch has 541 (76.0%) zeros                      Parch has 137 (76.5%) zeros                      Zeros
Fare has 13 (1.8%) zeros                         Fare has 2 (1.1%) zeros                          Zeros
Alert not present in this dataset                Fare is highly overall correlated with Pclass    High Correlation
Alert not present in this dataset                Pclass is highly overall correlated with Fare    High Correlation
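The percentages in the Missing and Zeros alerts follow from the row counts in the Dataset statistics (712 and 179). A minimal sketch that recomputes them, assuming each percentage is simply count divided by the number of observations; the dictionary layout is hypothetical:

```python
# Row counts from the Dataset statistics table.
N_TRAIN, N_TEST = 712, 179

# (variable, kind) -> (train count, test count), copied from the alerts.
alert_counts = {
    ("Age", "missing"): (140, 37),
    ("Cabin", "missing"): (553, 134),
    ("SibSp", "zeros"): (484, 124),
    ("Parch", "zeros"): (541, 137),
    ("Fare", "zeros"): (13, 2),
}


def pct(count, total):
    """Percentage of rows, rounded to one decimal like the report."""
    return round(100.0 * count / total, 1)


alert_pcts = {
    key: (pct(train, N_TRAIN), pct(test, N_TEST))
    for key, (train, test) in alert_counts.items()
}
for (variable, kind), (train_pct, test_pct) in alert_pcts.items():
    print(f"{variable} {kind}: train {train_pct}%, test {test_pct}%")
```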

Reproduction

                         Train Dataset                Test Dataset
Analysis started         2023-08-08 08:04:41.455563   2023-08-08 08:04:50.192450
Analysis finished        2023-08-08 08:04:50.175814   2023-08-08 08:04:58.102956
Duration                 8.72 seconds                 7.91 seconds
Software version         ydata-profiling v4.4.0       ydata-profiling v4.4.0
Download configuration   config.json                  config.json
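The reported durations are the difference between the start and finish timestamps. A short sketch using only the standard library; the timestamp format string is an assumption inferred from how the values are printed:

```python
from datetime import datetime

# Format inferred from the timestamps printed in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

runs = {
    "Train Dataset": ("2023-08-08 08:04:41.455563", "2023-08-08 08:04:50.175814"),
    "Test Dataset": ("2023-08-08 08:04:50.192450", "2023-08-08 08:04:58.102956"),
}

durations = {}
for name, (started, finished) in runs.items():
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    durations[name] = round(delta.total_seconds(), 2)

print(durations)
```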

Variables

PassengerId
Real number (ℝ)

               Train Dataset   Test Dataset
Distinct       712             179
Distinct (%)   100.0%          100.0%
Missing        0               0
Missing (%)    0.0%            0.0%
Infinite       0               0
Infinite (%)   0.0%            0.0%
Mean           448.23455       437.11173
Minimum        1               6
Maximum        891             890
Zeros          0               0
Zeros (%)      0.0%            0.0%
Negative       0               0
Negative (%)   0.0%            0.0%
Memory size    11.1 KiB        2.8 KiB

Quantile statistics

                            Train Dataset   Test Dataset
Minimum                     1               6
5-th percentile             45.1            49.5
Q1                          224.75          217.5
median                      453.5           423
Q3                          673.5           656
95-th percentile            845.9           846.4
Maximum                     891             890
Range                       890             884
Interquartile range (IQR)   448.75          438.5

Descriptive statistics

                                  Train Dataset   Test Dataset
Standard deviation                256.73142       260.34933
Coefficient of variation (CV)     0.57276134      0.59561277
Kurtosis                          -1.2053753      -1.1647939
Mean                              448.23455       437.11173
Median Absolute Deviation (MAD)   224.5           212
Skewness                          -0.027340115    0.10807588
Sum                               319143          78243
Variance                          65911.024       67781.774
Monotonicity                      Not monotonic   Not monotonic
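Several of the PassengerId statistics are derived from others: range is maximum minus minimum, IQR is Q3 minus Q1, CV is standard deviation over mean, and variance is the square of the standard deviation. The sketch below checks the reported figures against those textbook definitions; the tolerance is an arbitrary choice to absorb rounding in the printed values.

```python
import math

# Figures copied from the PassengerId tables.
stats = {
    "Train Dataset": {"min": 1, "max": 891, "q1": 224.75, "q3": 673.5,
                      "mean": 448.23455, "std": 256.73142,
                      "range": 890, "iqr": 448.75,
                      "cv": 0.57276134, "variance": 65911.024},
    "Test Dataset": {"min": 6, "max": 890, "q1": 217.5, "q3": 656,
                     "mean": 437.11173, "std": 260.34933,
                     "range": 884, "iqr": 438.5,
                     "cv": 0.59561277, "variance": 67781.774},
}


def consistent(s, rel_tol=1e-6):
    """Return True if the derived statistics agree with their definitions."""
    return (
        s["max"] - s["min"] == s["range"]
        and s["q3"] - s["q1"] == s["iqr"]
        and math.isclose(s["std"] / s["mean"], s["cv"], rel_tol=rel_tol)
        and math.isclose(s["std"] ** 2, s["variance"], rel_tol=rel_tol)
    )


checks = {name: consistent(s) for name, s in stats.items()}
print(checks)
```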
Histogram with fixed size bins (bins=50)
Most frequent values (Train Dataset)
Value                Count   Frequency (%)
332                  1       0.1%
673                  1       0.1%
610                  1       0.1%
280                  1       0.1%
294                  1       0.1%
401                  1       0.1%
123                  1       0.1%
184                  1       0.1%
203                  1       0.1%
439                  1       0.1%
Other values (702)   702     98.6%

Most frequent values (Test Dataset)
Value                Count   Frequency (%)
710                  1       0.6%
600                  1       0.6%
528                  1       0.6%
877                  1       0.6%
97                   1       0.6%
293                  1       0.6%
324                  1       0.6%
737                  1       0.6%
530                  1       0.6%
219                  1       0.6%
Other values (169)   169     94.4%

Smallest values (Train Dataset)
Value   Count   Frequency (%)
1       1       0.1%
2       1       0.1%
3       1       0.1%
4       1       0.1%
5       1       0.1%
7       1       0.1%
8       1       0.1%
9       1       0.1%
10      1       0.1%
12      1       0.1%

Smallest values (Test Dataset)
Value   Count   Frequency (%)
6       1       0.6%
11      1       0.6%
24      1       0.6%
26      1       0.6%
31      1       0.6%
32      1       0.6%
34      1       0.6%
40      1       0.6%
45      1       0.6%
50      1       0.6%

Pclass
Categorical

               Train Dataset   Test Dataset
Distinct       3               3
Distinct (%)   0.4%            1.7%
Missing        0               0
Missing (%)    0.0%            0.0%
Memory size    11.1 KiB        2.8 KiB

Length

                Train Dataset   Test Dataset
Max length      1               1
Median length   1               1
Mean length     1               1
Min length      1               1

Characters and Unicode

                      Train Dataset   Test Dataset
Total characters      712             179
Distinct characters   3               3
Distinct categories   1               1
Distinct scripts      1               1
Distinct blocks       1               1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
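
The category names used in this report ("Decimal Number", "Space Separator", "Other Punctuation", and so on) are Unicode general categories. As a rough illustration, Python's standard `unicodedata` module exposes the two-letter category code for each character; the mapping to long names below covers only the categories that appear in this report, and the helper `category_counts` is hypothetical. The standard library does not expose scripts or blocks, so those are not shown.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category over all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Pclass values are single digits, so every character is a Decimal Number.
pclass_sample = ["1", "2", "3", "3", "3"]
print(category_counts(pclass_sample))

# A passenger name mixes letters, spaces and punctuation.
name_counts = category_counts(['Harper, Miss. Annie Jessie "Nina"'])
print(name_counts)
```

This explains why single-digit categorical variables such as Pclass report exactly one distinct category.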

Unique

             Train Dataset   Test Dataset
Unique       0               0
Unique (%)   0.0%            0.0%

Sample

          Train Dataset   Test Dataset
1st row   1               3
2nd row   2               2
3rd row   3               3
4th row   3               2
5th row   3               3

Common Values

Train Dataset
Value   Count   Frequency (%)
3       398     55.9%
1       163     22.9%
2       151     21.2%

Test Dataset
Value   Count   Frequency (%)
3       93      52.0%
1       53      29.6%
2       33      18.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Train Dataset]

[Figure: Test Dataset]
ValueCountFrequency (%)
3 398
55.9%
1 163
22.9%
2 151
 
21.2%
ValueCountFrequency (%)
3 93
52.0%
1 53
29.6%
2 33
 
18.4%

Most occurring characters

ValueCountFrequency (%)
3 398
55.9%
1 163
22.9%
2 151
 
21.2%
ValueCountFrequency (%)
3 93
52.0%
1 53
29.6%
2 33
 
18.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 712
100.0%
ValueCountFrequency (%)
Decimal Number 179
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 398
55.9%
1 163
22.9%
2 151
 
21.2%
ValueCountFrequency (%)
3 93
52.0%
1 53
29.6%
2 33
 
18.4%

Most occurring scripts

ValueCountFrequency (%)
Common 712
100.0%
ValueCountFrequency (%)
Common 179
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 398
55.9%
1 163
22.9%
2 151
 
21.2%
ValueCountFrequency (%)
3 93
52.0%
1 53
29.6%
2 33
 
18.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 712
100.0%
ValueCountFrequency (%)
ASCII 179
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 398
55.9%
1 163
22.9%
2 151
 
21.2%
ValueCountFrequency (%)
3 93
52.0%
1 53
29.6%
2 33
 
18.4%

Name
Text

               Train Dataset   Test Dataset
Distinct       712             179
Distinct (%)   100.0%          100.0%
Missing        0               0
Missing (%)    0.0%            0.0%
Memory size    11.1 KiB        2.8 KiB

Length

                Train Dataset   Test Dataset
Max length      82              61
Median length   52              45
Mean length     26.768258       27.748603
Min length      12              14

Characters and Unicode

                      Train Dataset   Test Dataset
Total characters      19059           4967
Distinct characters   60              57
Distinct categories   7               7
Distinct scripts      2               2
Distinct blocks       1               1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Train Dataset   Test Dataset
Unique       712             179
Unique (%)   100.0%          100.0%

Sample

          Train Dataset                        Test Dataset
1st row   Partner, Mr. Austen                  Moubarek, Master. Halim Gonios ("William George")
2nd row   Berriman, Mr. William John           Kvillner, Mr. Johan Henrik Johannesson
3rd row   Tikkanen, Mr. Juho                   Alhomaki, Mr. Ilmari Rudolf
4th row   Hansen, Mr. Henrik Juul              Harper, Miss. Annie Jessie "Nina"
5th row   Andersson, Miss. Ebba Iris Alfrida   Nicola-Yarred, Miss. Jamila

Train Dataset
Value                 Count   Frequency (%)
mr                    421     14.6%
miss                  143     5.0%
mrs                   100     3.5%
william               52      1.8%
john                  36      1.3%
master                33      1.1%
henry                 32      1.1%
charles               20      0.7%
thomas                20      0.7%
george                16      0.6%
Other values (1260)   2006    69.7%

Test Dataset
Value                Count   Frequency (%)
mr                   100     13.4%
miss                 39      5.2%
mrs                  29      3.9%
william              12      1.6%
george               8       1.1%
james                8       1.1%
john                 8       1.1%
master               7       0.9%
mary                 5       0.7%
margaret             5       0.7%
Other values (435)   524     70.3%
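
The word-frequency tables for Name list lowercase tokens such as "mr" and "miss" with the punctuation removed. The report does not describe its tokenizer, so the following is only an assumed approximation: split on whitespace, strip punctuation from both ends of each token, and lowercase. That rule is at least consistent with tokens like "c.a" and "ston/o" seen for Ticket, where inner punctuation survives. The function `word_counts` is a hypothetical helper.

```python
import string
from collections import Counter


def word_counts(values):
    """Count lowercase tokens after trimming punctuation at token edges."""
    counts = Counter()
    for value in values:
        for token in value.split():
            word = token.strip(string.punctuation).lower()
            if word:
                counts[word] += 1
    return counts


# Train Dataset sample rows from the Name table.
names = [
    "Partner, Mr. Austen",
    "Berriman, Mr. William John",
    "Tikkanen, Mr. Juho",
    "Hansen, Mr. Henrik Juul",
    "Andersson, Miss. Ebba Iris Alfrida",
]
name_words = word_counts(names)
print(name_words["mr"], name_words["miss"])

# Inner punctuation is kept, as in the Ticket tokens "c.a" and "ston/o".
ticket_words = word_counts(["C.A. 18723", "STON/O 2. 3101293"])
print(sorted(ticket_words))
```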

Most occurring characters

ValueCountFrequency (%)
2169
 
11.4%
r 1551
 
8.1%
e 1328
 
7.0%
a 1324
 
6.9%
i 1065
 
5.6%
s 1041
 
5.5%
n 1024
 
5.4%
M 898
 
4.7%
l 841
 
4.4%
o 802
 
4.2%
Other values (50) 7016
36.8%
ValueCountFrequency (%)
566
 
11.4%
r 407
 
8.2%
e 375
 
7.5%
a 333
 
6.7%
n 280
 
5.6%
i 260
 
5.2%
s 256
 
5.2%
M 230
 
4.6%
l 226
 
4.6%
o 206
 
4.1%
Other values (47) 1828
36.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 12269
64.4%
Uppercase Letter 2893
 
15.2%
Space Separator 2169
 
11.4%
Other Punctuation 1504
 
7.9%
Open Punctuation 107
 
0.6%
Close Punctuation 107
 
0.6%
Dash Punctuation 10
 
0.1%
ValueCountFrequency (%)
Lowercase Letter 3177
64.0%
Uppercase Letter 752
 
15.1%
Space Separator 566
 
11.4%
Other Punctuation 395
 
8.0%
Close Punctuation 37
 
0.7%
Open Punctuation 37
 
0.7%
Dash Punctuation 3
 
0.1%

Most frequent character per category

Space Separator
ValueCountFrequency (%)
2169
100.0%
ValueCountFrequency (%)
566
100.0%
Lowercase Letter
ValueCountFrequency (%)
r 1551
12.6%
e 1328
10.8%
a 1324
10.8%
i 1065
8.7%
s 1041
8.5%
n 1024
8.3%
l 841
 
6.9%
o 802
 
6.5%
t 527
 
4.3%
h 412
 
3.4%
Other values (16) 2354
19.2%
ValueCountFrequency (%)
r 407
12.8%
e 375
11.8%
a 333
10.5%
n 280
8.8%
i 260
8.2%
s 256
8.1%
l 226
 
7.1%
o 206
 
6.5%
t 140
 
4.4%
h 105
 
3.3%
Other values (15) 589
18.5%
Uppercase Letter
ValueCountFrequency (%)
M 898
31.0%
A 195
 
6.7%
J 169
 
5.8%
H 162
 
5.6%
S 146
 
5.0%
C 131
 
4.5%
E 124
 
4.3%
W 119
 
4.1%
B 113
 
3.9%
L 109
 
3.8%
Other values (15) 727
25.1%
ValueCountFrequency (%)
M 230
30.6%
A 55
 
7.3%
J 46
 
6.1%
E 42
 
5.6%
C 41
 
5.5%
H 41
 
5.5%
S 34
 
4.5%
G 28
 
3.7%
F 27
 
3.6%
B 27
 
3.6%
Other values (14) 181
24.1%
Other Punctuation
ValueCountFrequency (%)
. 713
47.4%
, 712
47.3%
" 70
 
4.7%
' 8
 
0.5%
/ 1
 
0.1%
ValueCountFrequency (%)
. 179
45.3%
, 179
45.3%
" 36
 
9.1%
' 1
 
0.3%
Open Punctuation
ValueCountFrequency (%)
( 107
100.0%
ValueCountFrequency (%)
( 37
100.0%
Close Punctuation
ValueCountFrequency (%)
) 107
100.0%
ValueCountFrequency (%)
) 37
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 10
100.0%
ValueCountFrequency (%)
- 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 15162
79.6%
Common 3897
 
20.4%
ValueCountFrequency (%)
Latin 3929
79.1%
Common 1038
 
20.9%

Most frequent character per script

Common
ValueCountFrequency (%)
2169
55.7%
. 713
 
18.3%
, 712
 
18.3%
( 107
 
2.7%
) 107
 
2.7%
" 70
 
1.8%
- 10
 
0.3%
' 8
 
0.2%
/ 1
 
< 0.1%
ValueCountFrequency (%)
566
54.5%
. 179
 
17.2%
, 179
 
17.2%
) 37
 
3.6%
( 37
 
3.6%
" 36
 
3.5%
- 3
 
0.3%
' 1
 
0.1%
Latin
ValueCountFrequency (%)
r 1551
 
10.2%
e 1328
 
8.8%
a 1324
 
8.7%
i 1065
 
7.0%
s 1041
 
6.9%
n 1024
 
6.8%
M 898
 
5.9%
l 841
 
5.5%
o 802
 
5.3%
t 527
 
3.5%
Other values (41) 4761
31.4%
ValueCountFrequency (%)
r 407
 
10.4%
e 375
 
9.5%
a 333
 
8.5%
n 280
 
7.1%
i 260
 
6.6%
s 256
 
6.5%
M 230
 
5.9%
l 226
 
5.8%
o 206
 
5.2%
t 140
 
3.6%
Other values (39) 1216
30.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 19059
100.0%
ValueCountFrequency (%)
ASCII 4967
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2169
 
11.4%
r 1551
 
8.1%
e 1328
 
7.0%
a 1324
 
6.9%
i 1065
 
5.6%
s 1041
 
5.5%
n 1024
 
5.4%
M 898
 
4.7%
l 841
 
4.4%
o 802
 
4.2%
Other values (50) 7016
36.8%
ValueCountFrequency (%)
566
 
11.4%
r 407
 
8.2%
e 375
 
7.5%
a 333
 
6.7%
n 280
 
5.6%
i 260
 
5.2%
s 256
 
5.2%
M 230
 
4.6%
l 226
 
4.6%
o 206
 
4.1%
Other values (47) 1828
36.8%

Sex
Categorical

               Train Dataset   Test Dataset
Distinct       2               2
Distinct (%)   0.3%            1.1%
Missing        0               0
Missing (%)    0.0%            0.0%
Memory size    11.1 KiB        2.8 KiB

Length

                Train Dataset   Test Dataset
Max length      6               6
Median length   4               4
Mean length     4.6882022       4.7709497
Min length      4               4

Characters and Unicode

                      Train Dataset   Test Dataset
Total characters      3338            854
Distinct characters   5               5
Distinct categories   1               1
Distinct scripts      1               1
Distinct blocks       1               1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Train Dataset   Test Dataset
Unique       0               0
Unique (%)   0.0%            0.0%

Sample

          Train Dataset   Test Dataset
1st row   male            male
2nd row   male            male
3rd row   male            male
4th row   male            female
5th row   female          female

Common Values

Train Dataset
Value    Count   Frequency (%)
male     467     65.6%
female   245     34.4%

Test Dataset
Value    Count   Frequency (%)
male     110     61.5%
female   69      38.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Train Dataset]

[Figure: Test Dataset]
ValueCountFrequency (%)
male 467
65.6%
female 245
34.4%
ValueCountFrequency (%)
male 110
61.5%
female 69
38.5%

Most occurring characters

ValueCountFrequency (%)
e 957
28.7%
m 712
21.3%
a 712
21.3%
l 712
21.3%
f 245
 
7.3%
ValueCountFrequency (%)
e 248
29.0%
m 179
21.0%
a 179
21.0%
l 179
21.0%
f 69
 
8.1%
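
For a categorical variable with only two values, the character-level figures for Sex can be derived directly from the value counts. The sketch below is an illustration rather than the report's own method: it expands "male" and "female" by their counts and recovers the character frequencies, total characters and mean length.

```python
from collections import Counter

# Value counts from the Sex "Common Values" tables.
value_counts = {
    "Train Dataset": {"male": 467, "female": 245},
    "Test Dataset": {"male": 110, "female": 69},
}


def character_profile(counts):
    """Derive character-level statistics from category value counts."""
    chars = Counter()
    for value, n in counts.items():
        for ch in value:
            chars[ch] += n
    total_chars = sum(chars.values())
    mean_length = total_chars / sum(counts.values())
    return chars, total_chars, mean_length


profiles = {name: character_profile(c) for name, c in value_counts.items()}
for name, (chars, total, mean_length) in profiles.items():
    print(name, dict(chars), total, round(mean_length, 7))
```

The letter "e" occurs once in "male" and twice in "female", which is why it leads the character tables with 957 and 248 occurrences.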

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3338
100.0%
ValueCountFrequency (%)
Lowercase Letter 854
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 957
28.7%
m 712
21.3%
a 712
21.3%
l 712
21.3%
f 245
 
7.3%
ValueCountFrequency (%)
e 248
29.0%
m 179
21.0%
a 179
21.0%
l 179
21.0%
f 69
 
8.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 3338
100.0%
ValueCountFrequency (%)
Latin 854
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 957
28.7%
m 712
21.3%
a 712
21.3%
l 712
21.3%
f 245
 
7.3%
ValueCountFrequency (%)
e 248
29.0%
m 179
21.0%
a 179
21.0%
l 179
21.0%
f 69
 
8.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3338
100.0%
ValueCountFrequency (%)
ASCII 854
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 957
28.7%
m 712
21.3%
a 712
21.3%
l 712
21.3%
f 245
 
7.3%
ValueCountFrequency (%)
e 248
29.0%
m 179
21.0%
a 179
21.0%
l 179
21.0%
f 69
 
8.1%

Age
Real number (ℝ)

               Train Dataset   Test Dataset
Distinct       83              56
Distinct (%)   14.5%           39.4%
Missing        140             37
Missing (%)    19.7%           20.7%
Infinite       0               0
Infinite (%)   0.0%            0.0%
Mean           29.498846       30.505845
Minimum        0.42            0.83
Maximum        80              71
Zeros          0               0
Zeros (%)      0.0%            0.0%
Negative       0               0
Negative (%)   0.0%            0.0%
Memory size    11.1 KiB        2.8 KiB

Quantile statistics

                            Train Dataset   Test Dataset
Minimum                     0.42            0.83
5-th percentile             3.55            9
Q1                          21              20
median                      28              29
Q3                          38              38.75
95-th percentile            55.225          60.85
Maximum                     80              71
Range                       79.58           70.17
Interquartile range (IQR)   17              18.75

Descriptive statistics

                                  Train Dataset   Test Dataset
Standard deviation                14.500059       14.656239
Coefficient of variation (CV)     0.49154665      0.48044036
Kurtosis                          0.14923338      0.27728386
Mean                              29.498846       30.505845
Median Absolute Deviation (MAD)   8               9
Skewness                          0.33100174      0.62293002
Sum                               16873.34        4331.83
Variance                          210.25171       214.80535
Monotonicity                      Not monotonic   Not monotonic
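Because Age has missing values, its mean is taken over the non-missing observations only. The sketch below recomputes the mean from the reported sum and counts to illustrate this; it assumes missing rows are simply excluded, which the figures support.

```python
# Figures copied from the Age tables.
age = {
    "Train Dataset": {"rows": 712, "missing": 140, "sum": 16873.34, "mean": 29.498846},
    "Test Dataset": {"rows": 179, "missing": 37, "sum": 4331.83, "mean": 30.505845},
}


def mean_over_present(stats):
    """Mean computed over non-missing observations only."""
    present = stats["rows"] - stats["missing"]
    return stats["sum"] / present


recomputed = {name: mean_over_present(s) for name, s in age.items()}
for name, value in recomputed.items():
    print(f"{name}: {value:.6f} (reported {age[name]['mean']})")
```

Dividing by all 712 or 179 rows instead would give a noticeably lower mean, so the denominator matters when comparing against other tools.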
Histogram with fixed size bins (bins=50)
Most frequent values (Train Dataset)
Value               Count   Frequency (%)
24                  26      3.7%
22                  23      3.2%
28                  21      2.9%
25                  21      2.9%
18                  20      2.8%
30                  20      2.8%
19                  19      2.7%
21                  19      2.7%
29                  16      2.2%
36                  15      2.1%
Other values (73)   372     52.2%
(Missing)           140     19.7%

Most frequent values (Test Dataset)
Value               Count   Frequency (%)
36                  7       3.9%
19                  6       3.4%
18                  6       3.4%
23                  5       2.8%
21                  5       2.8%
16                  5       2.8%
30                  5       2.8%
40                  5       2.8%
24                  4       2.2%
35                  4       2.2%
Other values (46)   90      50.3%
(Missing)           37      20.7%

Smallest values (Train Dataset)
Value   Count   Frequency (%)
0.42    1       0.1%
0.67    1       0.1%
0.75    2       0.3%
0.83    1       0.1%
0.92    1       0.1%
1       7       1.0%
2       10      1.4%
3       6       0.8%
4       8       1.1%
5       2       0.3%

Smallest values (Test Dataset)
Value   Count   Frequency (%)
0.83    1       0.6%
4       2       1.1%
5       2       1.1%
6       1       0.6%
9       3       1.7%
10      1       0.6%
11      1       0.6%
13      1       0.6%
14      1       0.6%
15      1       0.6%

SibSp
Real number (ℝ)

               Train Dataset   Test Dataset
Distinct       7               5
Distinct (%)   1.0%            2.8%
Missing        0               0
Missing (%)    0.0%            0.0%
Infinite       0               0
Infinite (%)   0.0%            0.0%
Mean           0.55337079      0.40223464
Minimum        0               0
Maximum        8               4
Zeros          484             124
Zeros (%)      68.0%           69.3%
Negative       0               0
Negative (%)   0.0%            0.0%
Memory size    11.1 KiB        2.8 KiB

Quantile statistics

                            Train Dataset   Test Dataset
Minimum                     0               0
5-th percentile             0               0
Q1                          0               0
median                      0               0
Q3                          1               1
95-th percentile            3               2
Maximum                     8               4
Range                       8               4
Interquartile range (IQR)   1               1

Descriptive statistics

                                  Train Dataset   Test Dataset
Standard deviation                1.1764042       0.73070347
Coefficient of variation (CV)     2.1258877       1.81661
Kurtosis                          16.505734       7.4108164
Mean                              0.55337079      0.40223464
Median Absolute Deviation (MAD)   0               0
Skewness                          3.6193851       2.4416505
Sum                               394             72
Variance                          1.3839267       0.53392756
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=7)
Most frequent values (Train Dataset)
Value   Count   Frequency (%)
0       484     68.0%
1       164     23.0%
2       23      3.2%
4       16      2.2%
3       13      1.8%
8       7       1.0%
5       5       0.7%

Most frequent values (Test Dataset)
Value   Count   Frequency (%)
0       124     69.3%
1       45      25.1%
2       5       2.8%
3       3       1.7%
4       2       1.1%

Smallest values (Train Dataset)
Value   Count   Frequency (%)
0       484     68.0%
1       164     23.0%
2       23      3.2%
3       13      1.8%
4       16      2.2%
5       5       0.7%
8       7       1.0%

Smallest values (Test Dataset)
Value   Count   Frequency (%)
0       124     69.3%
1       45      25.1%
2       5       2.8%
3       3       1.7%
4       2       1.1%

Parch
Real number (ℝ)

               Train Dataset   Test Dataset
Distinct       7               6
Distinct (%)   1.0%            3.4%
Missing        0               0
Missing (%)    0.0%            0.0%
Infinite       0               0
Infinite (%)   0.0%            0.0%
Mean           0.37921348      0.39106145
Minimum        0               0
Maximum        6               5
Zeros          541             137
Zeros (%)      76.0%           76.5%
Negative       0               0
Negative (%)   0.0%            0.0%
Memory size    11.1 KiB        2.8 KiB

Quantile statistics

                            Train Dataset   Test Dataset
Minimum                     0               0
5-th percentile             0               0
Q1                          0               0
median                      0               0
Q3                          0               0
95-th percentile            2               2
Maximum                     6               5
Range                       6               5
Interquartile range (IQR)   0               0

Descriptive statistics

                                  Train Dataset   Test Dataset
Standard deviation                0.79166932      0.86318491
Coefficient of variation (CV)     2.0876613       2.2072871
Kurtosis                          9.6634025       10.119759
Mean                              0.37921348      0.39106145
Median Absolute Deviation (MAD)   0               0
Skewness                          2.695459        2.9125135
Sum                               270             70
Variance                          0.62674031      0.74508819
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=7)
Most frequent values (Train Dataset)
Value   Count   Frequency (%)
0       541     76.0%
1       94      13.2%
2       67      9.4%
4       3       0.4%
3       3       0.4%
5       3       0.4%
6       1       0.1%

Most frequent values (Test Dataset)
Value   Count   Frequency (%)
0       137     76.5%
1       24      13.4%
2       13      7.3%
3       2       1.1%
5       2       1.1%
4       1       0.6%

Smallest values (Train Dataset)
Value   Count   Frequency (%)
0       541     76.0%
1       94      13.2%
2       67      9.4%
3       3       0.4%
4       3       0.4%
5       3       0.4%
6       1       0.1%

Smallest values (Test Dataset)
Value   Count   Frequency (%)
0       137     76.5%
1       24      13.4%
2       13      7.3%
3       2       1.1%
4       1       0.6%
5       2       1.1%

Ticket
Text

               Train Dataset   Test Dataset
Distinct       558             169
Distinct (%)   78.4%           94.4%
Missing        0               0
Missing (%)    0.0%            0.0%
Memory size    11.1 KiB        2.8 KiB

Length

                Train Dataset   Test Dataset
Max length      18              18
Median length   17              17
Mean length     6.7668539       6.6871508
Min length      3               3

Characters and Unicode

                      Train Dataset   Test Dataset
Total characters      4818            1197
Distinct characters   35              28
Distinct categories   5               5
Distinct scripts      2               2
Distinct blocks       1               1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Train Dataset   Test Dataset
Unique       458             160
Unique (%)   64.3%           89.4%

Sample

          Train Dataset       Test Dataset
1st row   113043              2661
2nd row   28425               C.A. 18723
3rd row   STON/O 2. 3101293   SOTON/O2 3101287
4th row   350025              248727
5th row   347082              2651

Train Dataset
Value                Count   Frequency (%)
pc                   42      4.7%
c.a                  22      2.4%
ca                   14      1.6%
a/5                  14      1.6%
2                    10      1.1%
ston/o               10      1.1%
sc/paris             8       0.9%
soton/oq             7       0.8%
2343                 7       0.8%
347082               6       0.7%
Other values (585)   763     84.5%

Test Dataset
Value                Count   Frequency (%)
pc                   18      7.9%
c.a                  5       2.2%
soton/o.q            4       1.8%
347088               3       1.3%
a/5                  3       1.3%
w./c                 3       1.3%
2661                 2       0.9%
ston/o               2       0.9%
17485                2       0.9%
2                    2       0.9%
Other values (176)   183     80.6%

Most occurring characters

ValueCountFrequency (%)
3 607
12.6%
1 553
11.5%
2 493
10.2%
7 383
 
7.9%
4 365
 
7.6%
0 330
 
6.8%
6 329
 
6.8%
5 317
 
6.6%
9 252
 
5.2%
8 220
 
4.6%
Other values (25) 969
20.1%
ValueCountFrequency (%)
3 139
11.6%
1 136
11.4%
7 107
8.9%
2 101
8.4%
4 99
8.3%
6 93
 
7.8%
9 76
 
6.3%
0 76
 
6.3%
5 70
 
5.8%
8 62
 
5.2%
Other values (18) 238
19.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3849
79.9%
Uppercase Letter 525
 
10.9%
Other Punctuation 236
 
4.9%
Space Separator 191
 
4.0%
Lowercase Letter 17
 
0.4%
ValueCountFrequency (%)
Decimal Number 959
80.1%
Uppercase Letter 127
 
10.6%
Other Punctuation 59
 
4.9%
Space Separator 48
 
4.0%
Lowercase Letter 4
 
0.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 607
15.8%
1 553
14.4%
2 493
12.8%
7 383
10.0%
4 365
9.5%
0 330
8.6%
6 329
8.5%
5 317
8.2%
9 252
6.5%
8 220
 
5.7%
ValueCountFrequency (%)
3 139
14.5%
1 136
14.2%
7 107
11.2%
2 101
10.5%
4 99
10.3%
6 93
9.7%
9 76
7.9%
0 76
7.9%
5 70
7.3%
8 62
6.5%
Space Separator
ValueCountFrequency (%)
191
100.0%
ValueCountFrequency (%)
48
100.0%
Other Punctuation
ValueCountFrequency (%)
. 156
66.1%
/ 80
33.9%
ValueCountFrequency (%)
. 41
69.5%
/ 18
30.5%
Uppercase Letter
ValueCountFrequency (%)
C 118
22.5%
O 77
14.7%
P 74
14.1%
A 72
13.7%
S 60
11.4%
N 33
 
6.3%
T 29
 
5.5%
W 13
 
2.5%
Q 11
 
2.1%
I 11
 
2.1%
Other values (6) 27
 
5.1%
ValueCountFrequency (%)
C 33
26.0%
P 24
18.9%
O 23
18.1%
S 14
11.0%
A 10
 
7.9%
T 7
 
5.5%
N 7
 
5.5%
Q 4
 
3.1%
W 3
 
2.4%
H 1
 
0.8%
Lowercase Letter
ValueCountFrequency (%)
a 5
29.4%
s 4
23.5%
i 3
17.6%
r 3
17.6%
l 1
 
5.9%
e 1
 
5.9%
ValueCountFrequency (%)
a 1
25.0%
r 1
25.0%
i 1
25.0%
s 1
25.0%

Most occurring scripts

ValueCountFrequency (%)
Common 4276
88.8%
Latin 542
 
11.2%
ValueCountFrequency (%)
Common 1066
89.1%
Latin 131
 
10.9%

Most frequent character per script

Common
ValueCountFrequency (%)
3 607
14.2%
1 553
12.9%
2 493
11.5%
7 383
9.0%
4 365
8.5%
0 330
7.7%
6 329
7.7%
5 317
7.4%
9 252
5.9%
8 220
 
5.1%
Other values (3) 427
10.0%
ValueCountFrequency (%)
3 139
13.0%
1 136
12.8%
7 107
10.0%
2 101
9.5%
4 99
9.3%
6 93
8.7%
9 76
7.1%
0 76
7.1%
5 70
6.6%
8 62
5.8%
Other values (3) 107
10.0%
Latin
ValueCountFrequency (%)
C 118
21.8%
O 77
14.2%
P 74
13.7%
A 72
13.3%
S 60
11.1%
N 33
 
6.1%
T 29
 
5.4%
W 13
 
2.4%
Q 11
 
2.0%
I 11
 
2.0%
Other values (12) 44
 
8.1%
ValueCountFrequency (%)
C 33
25.2%
P 24
18.3%
O 23
17.6%
S 14
10.7%
A 10
 
7.6%
T 7
 
5.3%
N 7
 
5.3%
Q 4
 
3.1%
W 3
 
2.3%
a 1
 
0.8%
Other values (5) 5
 
3.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4818
100.0%
ValueCountFrequency (%)
ASCII 1197
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 607
12.6%
1 553
11.5%
2 493
10.2%
7 383
 
7.9%
4 365
 
7.6%
0 330
 
6.8%
6 329
 
6.8%
5 317
 
6.6%
9 252
 
5.2%
8 220
 
4.6%
Other values (25) 969
20.1%
ValueCountFrequency (%)
3 139
11.6%
1 136
11.4%
7 107
8.9%
2 101
8.4%
4 99
8.3%
6 93
 
7.8%
9 76
 
6.3%
0 76
 
6.3%
5 70
 
5.8%
8 62
 
5.2%
Other values (18) 238
19.9%

Fare
Real number (ℝ)

               Train Dataset   Test Dataset
Distinct       220             107
Distinct (%)   30.9%           59.8%
Missing        0               0
Missing (%)    0.0%            0.0%
Infinite       0               0
Infinite (%)   0.0%            0.0%
Mean           32.586276       30.684473
Minimum        0               0
Maximum        512.3292        262.375
Zeros          13              2
Zeros (%)      1.8%            1.1%
Negative       0               0
Negative (%)   0.0%            0.0%
Memory size    11.1 KiB        2.8 KiB

Quantile statistics

                            Train Dataset   Test Dataset
Minimum                     0               0
5-th percentile             7.225           7.215
Q1                          7.925           7.8958
median                      14.4542         14.5
Q3                          30.5            32.4104
95-th percentile            116.30125       95.23833
Maximum                     512.3292        262.375
Range                       512.3292        262.375
Interquartile range (IQR)   22.575          24.5146

Descriptive statistics

                                  Train Dataset   Test Dataset
Standard deviation                51.969529       39.447725
Coefficient of variation (CV)     1.5948287       1.2855924
Kurtosis                          33.679535       13.842715
Mean                              32.586276       30.684473
Median Absolute Deviation (MAD)   6.8042          7.25
Skewness                          4.8750656       3.2942177
Sum                               23201.429       5492.5207
Variance                          2700.832        1556.123
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values (Train Dataset)
Value                Count   Frequency (%)
8.05                 35      4.9%
13                   33      4.6%
7.8958               32      4.5%
7.75                 26      3.7%
26                   25      3.5%
10.5                 17      2.4%
7.925                17      2.4%
0                    13      1.8%
7.2292               13      1.8%
8.6625               13      1.8%
Other values (210)   488     68.5%

Most frequent values (Test Dataset)
Value               Count   Frequency (%)
13                  9       5.0%
7.75                8       4.5%
8.05                8       4.5%
10.5                7       3.9%
26                  6       3.4%
7.8958              6       3.4%
7.8542              4       2.2%
7.05                4       2.2%
15.2458             3       1.7%
7.775               3       1.7%
Other values (97)   121     67.6%

Smallest values (Train Dataset)
Value    Count   Frequency (%)
0        13      1.8%
4.0125   1       0.1%
5        1       0.1%
6.2375   1       0.1%
6.4375   1       0.1%
6.45     1       0.1%
6.4958   2       0.3%
6.75     2       0.3%
6.8583   1       0.1%
6.95     1       0.1%

Smallest values (Test Dataset)
Value    Count   Frequency (%)
0        2       1.1%
7.0458   1       0.6%
7.05     4       2.2%
7.125    2       1.1%
7.225    2       1.1%
7.2292   2       1.1%
7.25     3       1.7%
7.4958   1       0.6%
7.55     2       1.1%
7.7292   1       0.6%

Cabin
Text

               Train Dataset   Test Dataset
Distinct       117             42
Distinct (%)   73.6%           93.3%
Missing        553             134
Missing (%)    77.7%           74.9%
Memory size    11.1 KiB        2.8 KiB

Length

                Train Dataset   Test Dataset
Max length      15              15
Median length   3               3
Mean length     3.6289308       3.4444444
Min length      1               1

Characters and Unicode

                      Train Dataset   Test Dataset
Total characters      577             155
Distinct characters   19              18
Distinct categories   3               3
Distinct scripts      2               2
Distinct blocks       1               1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Train Dataset   Test Dataset
Unique       82              39
Unique (%)   51.6%           86.7%

Sample

          Train Dataset   Test Dataset
1st row   C124            D47
2nd row   B58 B60         C123
3rd row   B38             D28
4th row   C52             D19
5th row   C93             C110

Train Dataset
Value                Count   Frequency (%)
c23                  4       2.1%
c27                  4       2.1%
c25                  4       2.1%
f                    4       2.1%
c26                  3       1.6%
e101                 3       1.6%
f2                   3       1.6%
c22                  3       1.6%
g6                   3       1.6%
b98                  3       1.6%
Other values (120)   153     81.8%

Test Dataset
Value               Count   Frequency (%)
c126                2       3.9%
e25                 2       3.9%
d                   2       3.9%
e34                 1       2.0%
f33                 1       2.0%
d19                 1       2.0%
c110                1       2.0%
a6                  1       2.0%
d48                 1       2.0%
b69                 1       2.0%
Other values (38)   38      74.5%
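
With most Cabin values missing, the Distinct (%) and Unique (%) figures only make sense relative to the non-missing rows (159 in the Train Dataset, 45 in the Test Dataset). The following sketch checks that reading against the reported numbers; the denominator is an inference from the figures, not something the report states.

```python
# Counts copied from the Cabin tables.
cabin = {
    "Train Dataset": {"rows": 712, "missing": 553, "distinct": 117, "unique": 82},
    "Test Dataset": {"rows": 179, "missing": 134, "distinct": 42, "unique": 39},
}


def pct_of_present(stats, key):
    """Express a count as a percentage of the non-missing observations."""
    present = stats["rows"] - stats["missing"]
    return round(100.0 * stats[key] / present, 1)


cabin_pcts = {
    name: {
        "distinct": pct_of_present(s, "distinct"),
        "unique": pct_of_present(s, "unique"),
    }
    for name, s in cabin.items()
}
print(cabin_pcts)
```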

Most occurring characters

ValueCountFrequency (%)
C 64
11.1%
2 61
10.6%
B 50
 
8.7%
3 48
 
8.3%
1 48
 
8.3%
6 36
 
6.2%
5 36
 
6.2%
4 31
 
5.4%
8 29
 
5.0%
28
 
4.9%
Other values (9) 146
25.3%
ValueCountFrequency (%)
6 15
 
9.7%
D 15
 
9.7%
B 14
 
9.0%
1 13
 
8.4%
2 11
 
7.1%
3 11
 
7.1%
9 10
 
6.5%
5 9
 
5.8%
E 8
 
5.2%
7 8
 
5.2%
Other values (8) 41
26.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 362
62.7%
Uppercase Letter 187
32.4%
Space Separator 28
 
4.9%
ValueCountFrequency (%)
Decimal Number 98
63.2%
Uppercase Letter 51
32.9%
Space Separator 6
 
3.9%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C 64
34.2%
B 50
26.7%
E 25
 
13.4%
D 19
 
10.2%
F 12
 
6.4%
A 10
 
5.3%
G 6
 
3.2%
T 1
 
0.5%
ValueCountFrequency (%)
D 15
29.4%
B 14
27.5%
E 8
15.7%
C 7
13.7%
A 5
 
9.8%
F 1
 
2.0%
G 1
 
2.0%
Decimal Number
ValueCountFrequency (%)
2 61
16.9%
3 48
13.3%
1 48
13.3%
6 36
9.9%
5 36
9.9%
4 31
8.6%
8 29
8.0%
7 26
7.2%
0 24
 
6.6%
9 23
 
6.4%
ValueCountFrequency (%)
6 15
15.3%
1 13
13.3%
2 11
11.2%
3 11
11.2%
9 10
10.2%
5 9
9.2%
7 8
8.2%
8 8
8.2%
0 7
7.1%
4 6
 
6.1%
Space Separator
ValueCountFrequency (%)
28
100.0%
ValueCountFrequency (%)
6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 390
67.6%
Latin 187
32.4%
ValueCountFrequency (%)
Common 104
67.1%
Latin 51
32.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 64
34.2%
B 50
26.7%
E 25
 
13.4%
D 19
 
10.2%
F 12
 
6.4%
A 10
 
5.3%
G 6
 
3.2%
T 1
 
0.5%
ValueCountFrequency (%)
D 15
29.4%
B 14
27.5%
E 8
15.7%
C 7
13.7%
A 5
 
9.8%
F 1
 
2.0%
G 1
 
2.0%
Common
ValueCountFrequency (%)
2 61
15.6%
3 48
12.3%
1 48
12.3%
6 36
9.2%
5 36
9.2%
4 31
7.9%
8 29
7.4%
28
7.2%
7 26
6.7%
0 24
 
6.2%
ValueCountFrequency (%)
6 15
14.4%
1 13
12.5%
2 11
10.6%
3 11
10.6%
9 10
9.6%
5 9
8.7%
7 8
7.7%
8 8
7.7%
0 7
6.7%
6
 
5.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 577
100.0%
ValueCountFrequency (%)
ASCII 155
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C 64
11.1%
2 61
10.6%
B 50
 
8.7%
3 48
 
8.3%
1 48
 
8.3%
6 36
 
6.2%
5 36
 
6.2%
4 31
 
5.4%
8 29
 
5.0%
28
 
4.9%
Other values (9) 146
25.3%
ValueCountFrequency (%)
6 15
 
9.7%
D 15
 
9.7%
B 14
 
9.0%
1 13
 
8.4%
2 11
 
7.1%
3 11
 
7.1%
9 10
 
6.5%
5 9
 
5.8%
E 8
 
5.2%
7 8
 
5.2%
Other values (8) 41
26.5%

Embarked
Categorical

               Train Dataset   Test Dataset
Distinct       3               3
Distinct (%)   0.4%            1.7%
Missing        2               0
Missing (%)    0.3%            0.0%
Memory size    11.1 KiB        2.8 KiB

Length

                Train Dataset   Test Dataset
Max length      1               1
Median length   1               1
Mean length     1               1
Min length      1               1

Characters and Unicode

                      Train Dataset   Test Dataset
Total characters      710             179
Distinct characters   3               3
Distinct categories   1               1
Distinct scripts      1               1
Distinct blocks       1               1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Train Dataset   Test Dataset
Unique       0               0
Unique (%)   0.0%            0.0%

Sample

          Train Dataset   Test Dataset
1st row   S               C
2nd row   S               S
3rd row   S               S
4th row   S               S
5th row   S               C

Common Values

Train Dataset
Value       Count   Frequency (%)
S           525     73.7%
C           125     17.6%
Q           60      8.4%
(Missing)   2       0.3%

Test Dataset
Value   Count   Frequency (%)
S       119     66.5%
C       43      24.0%
Q       17      9.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Train Dataset]

[Figure: Test Dataset]
ValueCountFrequency (%)
s 525
73.9%
c 125
 
17.6%
q 60
 
8.5%
ValueCountFrequency (%)
s 119
66.5%
c 43
 
24.0%
q 17
 
9.5%
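
In the Train Dataset, S appears as 73.7% under Common Values but as 73.9% in the word and character tables. The gap is consistent with the two missing rows being included in one denominator and excluded from the other. A small sketch of that interpretation, which is inferred from the numbers rather than documented in the report:

```python
# Train Dataset counts from the Embarked tables.
counts = {"S": 525, "C": 125, "Q": 60}
missing = 2

present = sum(counts.values())  # non-missing rows
rows = present + missing        # all rows, including missing

# Share of all rows, as in the Common Values table.
share_of_rows = {k: round(100.0 * v / rows, 1) for k, v in counts.items()}
# Share of non-missing rows, as in the word and character tables.
share_of_present = {k: round(100.0 * v / present, 1) for k, v in counts.items()}

print(share_of_rows)
print(share_of_present)
```

The Test Dataset has no missing Embarked values, so its percentages are identical in both kinds of table.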

Most occurring characters

ValueCountFrequency (%)
S 525
73.9%
C 125
 
17.6%
Q 60
 
8.5%
ValueCountFrequency (%)
S 119
66.5%
C 43
 
24.0%
Q 17
 
9.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 710
100.0%
ValueCountFrequency (%)
Uppercase Letter 179
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 525
73.9%
C 125
 
17.6%
Q 60
 
8.5%
ValueCountFrequency (%)
S 119
66.5%
C 43
 
24.0%
Q 17
 
9.5%

Most occurring scripts

ValueCountFrequency (%)
Latin 710
100.0%
ValueCountFrequency (%)
Latin 179
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 525
73.9%
C 125
 
17.6%
Q 60
 
8.5%
ValueCountFrequency (%)
S 119
66.5%
C 43
 
24.0%
Q 17
 
9.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 710
100.0%
ValueCountFrequency (%)
ASCII 179
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S 525
73.9%
C 125
 
17.6%
Q 60
 
8.5%
ValueCountFrequency (%)
S 119
66.5%
C 43
 
24.0%
Q 17
 
9.5%

Survived
Categorical

               Train Dataset   Test Dataset
Distinct       2               2
Distinct (%)   0.3%            1.1%
Missing        0               0
Missing (%)    0.0%            0.0%
Memory size    11.1 KiB        2.8 KiB

Length

                Train Dataset   Test Dataset
Max length      1               1
Median length   1               1
Mean length     1               1
Min length      1               1

Characters and Unicode

                      Train Dataset   Test Dataset
Total characters      712             179
Distinct characters   2               2
Distinct categories   1               1
Distinct scripts      1               1
Distinct blocks       1               1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Train Dataset   Test Dataset
Unique       0               0
Unique (%)   0.0%            0.0%

Sample

          Train Dataset   Test Dataset
1st row   0               1
2nd row   0               0
3rd row   0               0
4th row   0               1
5th row   0               1

Common Values

Train Dataset
Value   Count   Frequency (%)
0       444     62.4%
1       268     37.6%

Test Dataset
Value   Count   Frequency (%)
0       105     58.7%
1       74      41.3%

Length

2023-08-08T10:05:12.350428image/svg+xmlMatplotlib v3.7.2, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

[Figures: common values plots for Train Dataset and Test Dataset; images not reproduced.]

Value  Train Count  Train Frequency (%)  Test Count  Test Frequency (%)
0      444          62.4%                105         58.7%
1      268          37.6%                74          41.3%

Most occurring characters

Value  Train Count  Train Frequency (%)  Test Count  Test Frequency (%)
0      444          62.4%                105         58.7%
1      268          37.6%                74          41.3%

Most occurring categories

Value           Train Count  Train Frequency (%)  Test Count  Test Frequency (%)
Decimal Number  712          100.0%               179         100.0%

Most frequent character per category

Decimal Number
Value  Train Count  Train Frequency (%)  Test Count  Test Frequency (%)
0      444          62.4%                105         58.7%
1      268          37.6%                74          41.3%

Most occurring scripts

Value   Train Count  Train Frequency (%)  Test Count  Test Frequency (%)
Common  712          100.0%               179         100.0%

Most frequent character per script

Common
Value  Train Count  Train Frequency (%)  Test Count  Test Frequency (%)
0      444          62.4%                105         58.7%
1      268          37.6%                74          41.3%

Most occurring blocks

Value  Train Count  Train Frequency (%)  Test Count  Test Frequency (%)
ASCII  712          100.0%               179         100.0%

Most frequent character per block

ASCII
Value  Train Count  Train Frequency (%)  Test Count  Test Frequency (%)
0      444          62.4%                105         58.7%
1      268          37.6%                74          41.3%

Interactions

[Figures: pairwise interaction plots, 25 for Train Dataset and 25 for Test Dataset; images not reproduced.]

Correlations

[Figures: correlation plots for Train Dataset and Test Dataset; images not reproduced.]

Train Dataset

             PassengerId  Age     SibSp   Parch   Fare    Pclass  Sex     Embarked  Survived
PassengerId  1.000        0.027   -0.081  0.000   -0.008  0.069   0.042   0.000     0.115
Age          0.027        1.000   -0.189  -0.263  0.121   0.250   0.072   0.000     0.115
SibSp        -0.081       -0.189  1.000   0.466   0.460   0.156   0.185   0.081     0.162
Parch        0.000        -0.263  0.466   1.000   0.417   0.000   0.246   0.000     0.164
Fare         -0.008       0.121   0.460   0.417   1.000   0.488   0.188   0.184     0.271
Pclass       0.069        0.250   0.156   0.000   0.488   1.000   0.122   0.224     0.321
Sex          0.042        0.072   0.185   0.246   0.188   0.122   1.000   0.076     0.538
Embarked     0.000        0.000   0.081   0.000   0.184   0.224   0.076   1.000     0.154
Survived     0.115        0.115   0.162   0.164   0.271   0.321   0.538   0.154     1.000

Test Dataset

             PassengerId  Age     SibSp   Parch   Fare    Pclass  Sex     Embarked  Survived
PassengerId  1.000        0.102   0.027   0.009   -0.036  0.139   0.000   0.000     0.000
Age          0.102        1.000   -0.152  -0.214  0.193   0.243   0.146   0.000     0.171
SibSp        0.027        -0.152  1.000   0.387   0.399   0.000   0.307   0.090     0.256
Parch        0.009        -0.214  0.387   1.000   0.385   0.000   0.284   0.109     0.169
Fare         -0.036       0.193   0.399   0.385   1.000   0.520   0.232   0.277     0.368
Pclass       0.139        0.243   0.000   0.000   0.520   1.000   0.119   0.363     0.384
Sex          0.000        0.146   0.307   0.284   0.232   0.119   1.000   0.207     0.532
Embarked     0.000        0.000   0.090   0.109   0.277   0.363   0.207   1.000     0.177
Survived     0.000        0.171   0.256   0.169   0.368   0.384   0.532   0.177     1.000

Missing values

Figures for Train Dataset and Test Dataset (images not reproduced):

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
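
One common way to compute the two quantities described above is to turn each column into a presence indicator (1 if a value exists, 0 if it is missing), count the zeros per column, and take the Pearson correlation between pairs of indicators. The sketch below assumes that approach; the report does not state how its figures were computed. The function names and the six sample rows are hypothetical and are not taken from the profiled data.

```python
from math import sqrt


def presence(rows, column):
    """1 where the column holds a value, 0 where it is missing (None)."""
    return [0 if row[column] is None else 1 for row in rows]


def missing_counts(rows, columns):
    """Number of missing cells per column."""
    return {c: len(rows) - sum(presence(rows, c)) for c in columns}


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation of two presence indicators; None if either is constant."""
    a, b = presence(rows, col_a), presence(rows, col_b)
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a column that is always (or never) present has no defined correlation
    return cov / sqrt(var_a * var_b)


# Hypothetical rows for illustration only.
rows = [
    {"Age": 22.0, "Cabin": None},
    {"Age": None, "Cabin": None},
    {"Age": 38.0, "Cabin": "C85"},
    {"Age": None, "Cabin": None},
    {"Age": 35.0, "Cabin": "C123"},
    {"Age": 54.0, "Cabin": None},
]

print(missing_counts(rows, ["Age", "Cabin"]))
print(nullity_correlation(rows, "Age", "Cabin"))
```

A value near 1 means the two columns tend to be present or missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means no linear relationship between their missingness.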

Sample

Train Dataset

     PassengerId  Pclass  Name                                Sex     Age   SibSp  Parch  Ticket             Fare      Cabin    Embarked  Survived
331  332          1       Partner, Mr. Austen                 male    45.5  0      0      113043             28.5000   C124     S         0
733  734          2       Berriman, Mr. William John          male    23.0  0      0      28425              13.0000   NaN      S         0
382  383          3       Tikkanen, Mr. Juho                  male    32.0  0      0      STON/O 2. 3101293  7.9250    NaN      S         0
704  705          3       Hansen, Mr. Henrik Juul             male    26.0  1      0      350025             7.8542    NaN      S         0
813  814          3       Andersson, Miss. Ebba Iris Alfrida  female  6.0   4      2      347082             31.2750   NaN      S         0
118  119          1       Baxter, Mr. Quigg Edmond            male    24.0  0      1      PC 17558           247.5208  B58 B60  C         0
536  537          1       Butt, Major. Archibald Willingham   male    45.0  0      0      113050             26.5500   B38      S         0
361  362          2       del Carlo, Mr. Sebastiano           male    29.0  1      0      SC/PARIS 2167      27.7208   NaN      C         0
29   30           3       Todoroff, Mr. Lalio                 male    NaN   0      0      349216             7.8958    NaN      S         0
55   56           1       Woolner, Mr. Hugh                   male    NaN   0      0      19947              35.5000   C52      S         1

Test Dataset

     PassengerId  Pclass  Name                                                Sex     Age   SibSp  Parch  Ticket             Fare      Cabin    Embarked  Survived
709  710          3       Moubarek, Master. Halim Gonios ("William George")   male    NaN   1      1      2661               15.2458   NaN      C         1
439  440          2       Kvillner, Mr. Johan Henrik Johannesson              male    31.0  0      0      C.A. 18723         10.5000   NaN      S         0
840  841          3       Alhomaki, Mr. Ilmari Rudolf                         male    20.0  0      0      SOTON/O2 3101287   7.9250    NaN      S         0
720  721          2       Harper, Miss. Annie Jessie "Nina"                   female  6.0   0      1      248727             33.0000   NaN      S         1
39   40           3       Nicola-Yarred, Miss. Jamila                         female  14.0  1      0      2651               11.2417   NaN      C         1
290  291          1       Barber, Miss. Ellen "Nellie"                        female  26.0  0      0      19877              78.8500   NaN      S         1
300  301          3       Kelly, Miss. Anna Katherine "Annie Kate"            female  NaN   0      0      9234               7.7500    NaN      Q         1
333  334          3       Vander Planke, Mr. Leo Edmondus                     male    16.0  2      0      345764             18.0000   NaN      S         0
208  209          3       Carr, Miss. Helen "Ellen"                           female  16.0  0      0      367231             7.7500    NaN      Q         1
136  137          1       Newsom, Miss. Helen Monypeny                        female  19.0  0      2      11752              26.2833   D47      S         1

Train Dataset

     PassengerId  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket             Fare      Cabin    Embarked  Survived
121  122          3       Moore, Mr. Leonard Charles                         male    NaN   0      0      A4. 54510          8.0500    NaN      S         0
614  615          3       Brocklebank, Mr. William Alfred                    male    35.0  0      0      364512             8.0500    NaN      S         0
20   21           2       Fynney, Mr. Joseph J                               male    35.0  0      0      239865             26.0000   NaN      S         0
700  701          1       Astor, Mrs. John Jacob (Madeleine Talmadge Force)  female  18.0  1      0      PC 17757           227.5250  C62 C64  C         1
71   72           3       Goodwin, Miss. Lillian Amy                         female  16.0  5      2      CA 2144            46.9000   NaN      S         0
106  107          3       Salkjelsvik, Miss. Anna Kristine                   female  21.0  0      0      343120             7.6500    NaN      S         1
270  271          1       Cairns, Mr. Alexander                              male    NaN   0      0      113798             31.0000   NaN      S         0
860  861          3       Hansen, Mr. Claus Peter                            male    41.0  2      0      350026             14.1083   NaN      S         0
435  436          1       Carter, Miss. Lucile Polk                          female  14.0  1      2      113760             120.0000  B96 B98  S         1
102  103          1       White, Mr. Richard Frasar                          male    21.0  0      1      35281              77.2875   D26      S         0

Test Dataset

     PassengerId  Pclass  Name                                                       Sex     Age   SibSp  Parch  Ticket              Fare      Cabin    Embarked  Survived
363  364          3       Asim, Mr. Adola                                            male    35.0  0      0      SOTON/O.Q. 3101310  7.0500    NaN      S         0
97   98           1       Greenfield, Mr. William Bertram                            male    23.0  0      1      PC 17759            63.3583   D10 D12  C         1
417  418          2       Silven, Miss. Lyyli Karoliina                              female  18.0  0      2      250652              13.0000   NaN      S         1
572  573          1       Flynn, Mr. John Irwin ("Irving")                           male    36.0  0      0      PC 17474            26.3875   E25      S         1
852  853          3       Boulos, Miss. Nourelain                                    female  9.0   1      1      2678                15.2458   NaN      C         0
433  434          3       Kallio, Mr. Nikolai Erland                                 male    17.0  0      0      STON/O 2. 3101274   7.1250    NaN      S         0
773  774          3       Elias, Mr. Dibo                                            male    NaN   0      0      2674                7.2250    NaN      C         0
25   26           3       Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson)  female  38.0  1      5      347077              31.3875   NaN      S         1
84   85           2       Ilett, Miss. Bertha                                        female  17.0  0      0      SO/C 14885          10.5000   NaN      S         1
10   11           3       Sandstrom, Miss. Marguerite Rut                            female  4.0   1      1      PP 9549             16.7000   G6       S         1

Duplicate rows

Train Dataset

PassengerId  Pclass  Name  Sex  Age  SibSp  Parch  Ticket  Fare  Cabin  Embarked  Survived  # duplicates
Dataset does not contain duplicate rows.

Test Dataset

PassengerId  Pclass  Name  Sex  Age  SibSp  Parch  Ticket  Fare  Cabin  Embarked  Survived  # duplicates
Dataset does not contain duplicate rows.
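
For readers who want to repeat the duplicate check outside the report, a minimal sketch is given below. It assumes that a duplicate means an exact match on every field and that missing values are represented as `None`; the helper name `duplicate_rows` and the sample rows are hypothetical.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return {row: occurrences} for rows that appear more than once (exact match on all fields)."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical rows (PassengerId, Pclass, Sex); every PassengerId is unique here.
unique_rows = [(1, 3, "male"), (2, 1, "female"), (3, 3, "female")]
print(duplicate_rows(unique_rows))  # {}

# The same rows with one repeated entry.
with_repeat = unique_rows + [(2, 1, "female")]
print(duplicate_rows(with_repeat))  # {(2, 1, 'female'): 2}
```

Because PassengerId is unique in both datasets, no two full rows can match, which is consistent with the empty result reported above.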